Chapter 7 Omit some noise points for more cluster clarity

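We could reduce the noise on the plot by omitting some of the points with high outlier scores. Generally I hate doing this, because it is an easy way to accidentally lose something you didn't know you wanted. Still, it can have its advantages as a strategy, and the outlier_scores returned by hdbscan() (scores between 0 and 1, where higher means more outlier-like) make a convenient threshold to experiment with in further analysis. Below I keep only points with an outlier score under 0.6, and also drop the few points that UMAP placed 20 or more units from the origin along either axis.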
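Before settling on a cutoff, it can help to see how much data each candidate threshold throws away. The sketch below is a hypothetical base-R helper, not part of the original analysis: keep_index() reproduces the subset rule used in this chapter, and retention_curve() reports the fraction of points kept at each cutoff. It runs on simulated data standing in for svd_ump$layout and clus$outlier_scores.

```r
# Hypothetical helper (not from the original analysis): TRUE for points whose
# 2-D embedding coordinates lie strictly inside +/- max_abs on both axes and
# whose outlier score is below max_score. Mirrors this chapter's index_subset.
keep_index <- function(layout, scores, max_score = 0.6, max_abs = 20) {
  abs(layout[, 1]) < max_abs &
    abs(layout[, 2]) < max_abs &
    scores < max_score
}

# Fraction of points retained for each score cutoff (the window is held fixed).
retention_curve <- function(layout, scores,
                            cutoffs = seq(0.1, 1, by = 0.1), max_abs = 20) {
  kept <- vapply(cutoffs,
                 function(cut) mean(keep_index(layout, scores, cut, max_abs)),
                 numeric(1))
  data.frame(cutoff = cutoffs, fraction_kept = kept)
}

# Simulated stand-ins for svd_ump$layout and clus$outlier_scores
set.seed(1)
toy_layout <- matrix(rnorm(200, sd = 8), ncol = 2)
toy_scores <- runif(100)

ret <- retention_curve(toy_layout, toy_scores)
print(ret)
```

On the real data the call would be retention_curve(svd_ump$layout, clus$outlier_scores). A large jump between neighbouring cutoffs means many points have scores in that range, so moving the threshold there changes the plot substantially.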


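# Keep points inside a +/-20 window of the UMAP layout (drops a few far-flung
# points) whose HDBSCAN outlier score is below 0.6
index_subset <- abs(svd_ump$layout[, 1]) < 20 &
  abs(svd_ump$layout[, 2]) < 20 &
  clus$outlier_scores < 0.6
data_subset <- svd_ump$layout[index_subset, ]
raw_text_subset <- raw_text[index_subset]
head_subset <- head[index_subset]
clusters <- factor(clus$cluster[index_subset])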

fig <- plot_ly(type = 'scatter', mode = 'markers')
fig <- fig %>%
  add_trace(
    x = data_subset[, 1],
    y = data_subset[, 2],
    # Hover label: heading, text and cluster number on separate lines
    text = ~paste('Heading:', head_subset, "<br>Text:", raw_text_subset, "<br>Cluster Number:", clusters),
    hoverinfo = 'text',
    color = clusters,
    showlegend = FALSE
  )
fig
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors

## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
# Uncomment to save the interactive plot as a standalone HTML file
# htmlwidgets::saveWidget(fig, "All_clusters_noTopics_UMAPClus.html")
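The warnings above come from plotly asking RColorBrewer's "Set2" palette, which has only 8 colours, for one colour per cluster; with more clusters than that, plotly has to stretch those 8 colours, so neighbouring clusters can end up looking nearly identical. One possible workaround, sketched here as an assumption rather than what was done in this chapter, is to generate exactly one colour per cluster with base R's grDevices::hcl.colors() (R 3.6 or later), which can produce any number of colours from its own "Set 2" HCL palette. cluster_palette() is a hypothetical helper name.

```r
# Hypothetical helper: one qualitative colour per factor level, with no
# 8-colour cap. grDevices::hcl.colors() accepts any n for its HCL palettes.
cluster_palette <- function(clusters, palette = "Set 2") {
  n <- nlevels(factor(clusters))
  grDevices::hcl.colors(n, palette = palette)
}

# Toy labels standing in for clus$cluster[index_subset] (12 clusters, 0 = noise)
toy_clusters <- factor(0:11)
pal <- cluster_palette(toy_clusters)
print(pal)
```

Passing the result to plotly, for example plot_ly(type = 'scatter', mode = 'markers', colors = cluster_palette(clusters)), should give each cluster its own colour and avoid the brewer.pal() warnings; this has not been checked against the original data.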